Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Still working on it.
embeddings = self.padding(embeddings)
embeddings = self.convolution(embeddings, training=training)
embeddings = self.batch_norm(embeddings, training=training)
# embeddings shape = (bsz, height, width, num_channels)
Won't it be a channels-first memory layout after the transpose?
The embeddings that come in are channels-first. I use the first transpose to make them channels-last, which helps with the padding, conv and batch_norm layers. After applying those operations I use the second transpose to make the computed embeddings channels-first again.
Am I missing something here?
Got it. I thought you were referring to the shape after the transpose.
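For reference, a minimal sketch of the layout round trip described above (the shapes are assumed toy values, not taken from the PR):

import tensorflow as tf

# Embeddings arrive channels-first (NCHW); the TF padding/conv/batch-norm layers
# here operate channels-last (NHWC), so we transpose before and after.
embeddings = tf.random.normal((2, 3, 224, 224))            # (bsz, num_channels, height, width)
embeddings = tf.transpose(embeddings, perm=(0, 2, 3, 1))   # -> (bsz, height, width, num_channels)
# padding / convolution / batch_norm would be applied here in channels-last layout
embeddings = tf.transpose(embeddings, perm=(0, 3, 1, 2))   # back to (bsz, num_channels, height, width)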
embeddings = self.padding(embeddings)
embeddings = self.convolution(embeddings, training=training)
I am assuming you've verified that this is indeed the sequence to follow when matching a torch.nn.Conv2d() with padding defined?
I have actually used this comment and the linked Colab notebook.
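For context, a minimal sketch (with assumed hyperparameters, not the PR's actual values) of the usual way a torch.nn.Conv2d with explicit padding is mirrored in TF: pad first, then run a convolution with padding="valid":

import tensorflow as tf

padding = tf.keras.layers.ZeroPadding2D(padding=1)          # mirrors padding=1 in torch.nn.Conv2d
convolution = tf.keras.layers.Conv2D(
    filters=64, kernel_size=3, strides=2, padding="valid"   # "valid" so no extra implicit padding
)
x = tf.random.normal((1, 224, 224, 3))                      # channels-last input
y = convolution(padding(x))
print(y.shape)                                              # (1, 112, 112, 64) for these assumed values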
| name="convolution", | ||
| ) | ||
| # The epsilon and momentum used here are the defaults in torch batch norm layer. | ||
| self.batch_norm = tf.keras.layers.BatchNormalization(epsilon=1e-05, momentum=0.9, name="batch_norm") |
As per the official docs (https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html), these are the defaults:
eps=1e-05, momentum=0.1
transformers/src/transformers/models/resnet/modeling_tf_resnet.py
Lines 62 to 63 in 7032e02
I have found that in this repo all the BatchNorm layers use eps=1e-5 and momentum=0.9.
Also, true. The default momentum clearly isn't 0.9. @amyeroberts any inputs here?
But did changing the defaults help with the cross-loading by any chance?
It did not help with anything specific. I just followed what all the other BatchNorm layers in the repo use.
What's the current issue again?
It would be better to have a detailed trace along with the steps to reproduce it somewhere.
The momentum parameter (rather confusingly) is different for PyTorch and TensorFlow. For momentum x in PyTorch, the equivalent value in TensorFlow is 1 - x.
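A minimal sketch of that equivalence (using the PT defaults): PyTorch updates the running statistics as running = (1 - momentum) * running + momentum * batch, while Keras updates them as moving = momentum * moving + (1 - momentum) * batch, so momentum_tf = 1 - momentum_pt.

import tensorflow as tf
import torch

momentum_pt = 0.1                                            # torch.nn.BatchNorm2d default
bn_pt = torch.nn.BatchNorm2d(64, eps=1e-05, momentum=momentum_pt)
bn_tf = tf.keras.layers.BatchNormalization(
    epsilon=1e-05, momentum=1 - momentum_pt                  # i.e. 0.9, matching PT's running-stat update
)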
My brain will remain dead for today.
*args,
**kwargs,
Is there a reason to have separate *args and then **kwargs? Usually, we only take **kwargs here.
No specific reason. I can change it if you want!
It's just that it deviates from the standard implementations without any reasoning, so it would be better to change it.
from transformers import LevitFeatureExtractor, LevitModel, TFLevitModel
import torch
import tensorflow as tf
import numpy as np
from datasets import load_dataset

dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]
feature_extractor = LevitFeatureExtractor.from_pretrained("facebook/levit-128S")

# Models
model_tf = TFLevitModel.from_pretrained("facebook/levit-128S", from_pt=True)
model_pt = LevitModel.from_pretrained("facebook/levit-128S")

# Inputs
inputs_tf = feature_extractor(image, return_tensors="tf")
inputs_pt = feature_extractor(image, return_tensors="pt")

# Outputs
outputs_tf = model_tf(**inputs_tf, training=False, output_hidden_states=False)
with torch.no_grad():
    outputs_pt = model_pt(**inputs_pt, output_hidden_states=False)

# Assertion
check_tf = outputs_tf.last_hidden_state
check_pt = outputs_pt.last_hidden_state
np.testing.assert_allclose(
    check_tf,
    check_pt,
    rtol=1e-5,
    atol=1e-5,
)

This creates a 100% mismatch.
And there's no warning when you're doing model_tf = TFLevitModel.from_pretrained("facebook/levit-128S", from_pt=True)? Like certain weights weren't used and so on?
All the weights are ported properly. The only thing I see is about the batch_norm parameters.
I have also followed this comment to check whether the batch norm parameters need to be ported or not.
Hmm. If the weights have been cross-loaded successfully and the outputs do not match, it means the intermediate computations differ.
In this case, I would pick a few modules from the PT implementation that could lead to differences in the outputs of the respective TF implementation and compare them.
Do you have any such modules in mind?
The TFLevitResidualLayer would be a good place to start the inspection.
I am placing my bets on this module due to its use of the random module.
If that's the case, I'd suggest starting your investigation from there. Specifically, ensure each intermediate operation in the PT module matches its counterpart in the TF module.
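A minimal sketch of that kind of layer-by-layer check, reusing model_tf, model_pt and the inputs from the reproduction script above (the exact hidden-state layout is an assumption):

import numpy as np
import torch

outputs_tf = model_tf(**inputs_tf, training=False, output_hidden_states=True)
with torch.no_grad():
    outputs_pt = model_pt(**inputs_pt, output_hidden_states=True)

# Walk the hidden states in parallel and report where PT and TF start to diverge.
for idx, (h_tf, h_pt) in enumerate(zip(outputs_tf.hidden_states, outputs_pt.hidden_states)):
    max_diff = np.max(np.abs(h_tf.numpy() - h_pt.numpy()))
    print(f"hidden state {idx}: max abs diff = {max_diff:.6f}")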
sayakpaul left a comment
@ariG23498 see if changing the BN defaults fixes your issue.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Still working.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Working!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Adding TensorFlow version of LeViT
This PR adds the TensorFlow version of LeViT.
Before submitting
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case. Issue linked: Adding TensorFlow port of LeViT #19123
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.